Bisimulation Metrics for Continuous Markov Decision Processes
Abstract
In recent years, various metrics have been developed for measuring the behavioural similarity of states in probabilistic transition systems [Desharnais et al., Proceedings of CONCUR, (1999), pp. 258-273; van Breugel and Worrell, Proceedings of ICALP, (2001), pp. 421-432]. In the context of finite Markov decision processes, we have built on these metrics to provide a robust quantitative analogue of stochastic bisimulation [Ferns et al., Proceedings of UAI, (2004), pp. 162-169] and an efficient algorithm for its calculation [Ferns et al., Proceedings of UAI, (2006), pp. 174-181]. In this paper, we seek to properly extend these bisimulation metrics to Markov decision processes with continuous state spaces. In particular, we provide the first distance-estimation scheme for metrics based on bisimulation for continuous probabilistic transition systems. Our work, based on statistical sampling and infinite-dimensional linear programming, is a crucial first step in formally guiding real-world planning, where tasks are usually continuous and highly stochastic in nature (e.g., robot navigation) and a parametric model or crude finite approximation must often be substituted for the true model. We show that the optimal value function associated with a discounted infinite-horizon planning task is continuous with respect to metric distances. Thus, our metrics allow one to reason about the quality of the solution obtained by replacing one model with another. Alternatively, they may potentially be used directly for state aggregation. An earlier version of this work appears in the doctoral thesis of Norm Ferns [McGill University, (2008)].
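To make the continuity claim concrete, the following is a sketch of the general shape such a metric takes in the finite-state case. All symbols are assumptions introduced here for illustration and do not appear in the abstract: $r_s^a$ is the reward for action $a$ in state $s$, $P_s^a$ is the corresponding transition distribution, $\gamma$ is the discount factor, $c_R, c_T \ge 0$ are weights with $c_R + c_T \le 1$, and $T_K(d)$ is the Kantorovich distance between distributions induced by $d$. The exact formulation in the paper may differ.

```latex
% Assumed finite-state form: the metric is a fixed point of
\[
  d(s,t) \;=\; \max_{a}\Big( c_R\,\bigl|r_s^a - r_t^a\bigr|
        \;+\; c_T\, T_K(d)\bigl(P_s^a, P_t^a\bigr) \Big).
\]
% If \gamma \le c_T, an induction over value-iteration steps, using
% Kantorovich duality for 1-Lipschitz functions, gives the bound
\[
  c_R\,\bigl|V^*(s) - V^*(t)\bigr| \;\le\; d(s,t),
\]
% so states that are close under d have close optimal values.
```

Under these assumptions, a distance of zero means the two states have equal rewards and transition distributions that agree up to zero-distance states for every action, which is the sense in which the metric is a quantitative analogue of bisimulation.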
Similar resources
Metrics for Markov Decision Processes with Infinite State Spaces
We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning tas...
Bisimulation Metrics are Optimal Value Functions
Bisimulation is a notion of behavioural equivalence on the states of a transition system. Its definition has been extended to Markov decision processes, where it can be used to aggregate states. A bisimulation metric is a quantitative analog of bisimulation that measures how similar states are from the perspective of long-term behavior. Bisimulation metrics have been used to establish approxi...
Bisimulation and Logical Preservation for Continuous-Time Markov Decision Processes
This paper introduces strong bisimulation for continuous-time Markov decision processes (CTMDPs), a stochastic model which allows for a nondeterministic choice between exponential distributions, and shows that bisimulation preserves the validity of CSL. To that end, we interpret the semantics of CSL (a stochastic variant of CTL for continuous-time Markov chains) on CTMDPs and show its measure-theo...
Bisimulation for Markov Decision Processes through Families of Functional Expressions
We transfer a notion of quantitative bisimilarity for labelled Markov processes [1] to Markov decision processes with continuous state spaces. This notion takes the form of a pseudometric on the system states, cast in terms of the equivalence of a family of functional expressions evaluated on those states and interpreted as a real-valued modal logic. Our proof amounts to a slight modification o...
Taking It to the Limit: Approximate Reasoning for Markov Processes
We develop a fusion of logical and metrical principles for reasoning about Markov processes. More precisely, we lift metrics from processes to sets of processes satisfying a formula and explore how the satisfaction relation behaves as sequences of processes and sequences of formulas approach limits. A key new concept is dynamically-continuous metric bisimulation which is a property of (pseudo)m...
Journal title:
- SIAM J. Comput.
Volume 40, Issue
Pages -
Publication date 2011